Score Level versus Audio Level Fusion for Voice Pathology Detection on the Saarbrücken Voice Database
Abstract
This article presents a set of experiments on pathological voice detection over the Saarbrücken Voice Database (SVD). The SVD is freely available online and contains a collection of voice recordings covering different pathologies, both functional and organic. It includes recordings from more than 2000 speakers in which the sustained vowels /a/, /i/, and /u/ are pronounced with normal, low, high, and low-high-low intonations. This variety of sounds makes it possible to set up different experiments. This paper compares the performance of two systems: one in which all the vowels and intonations are pooled together to train a single model per class, and one in which a different model per class is trained for each vowel and intonation and the scores of the subsystems are fused at the end. We call the first approach audio level fusion and the second score level fusion. For classification, a generative Gaussian mixture model is used, trained with mel-frequency cepstral coefficients, harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio. The results show that score level fusion is far more effective than audio level fusion.
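The difference between the two fusion strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' system: a single diagonal Gaussian stands in for each Gaussian mixture model, the features are synthetic rather than MFCC or noise measures from the SVD, the task names in `TASKS` are hypothetical, and the unweighted mean used to fuse scores is an assumed rule, since the abstract does not say how the subsystem scores are combined.

```python
import math
import random
from statistics import fmean, pvariance

TASKS = ["a_normal", "i_normal", "u_normal"]  # hypothetical vowel/intonation tasks
CLASSES = ["healthy", "pathological"]

def fit_gaussian(frames):
    # One diagonal Gaussian: a one-component stand-in for a full GMM.
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    variances = [max(pvariance(col, mu), 1e-6) for col, mu in zip(columns, means)]
    return means, variances

def avg_loglik(model, frames):
    # Average per-frame log-likelihood under the diagonal Gaussian.
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, mu, var in zip(frame, means, variances):
            total -= 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total / len(frames)

def llr(models, frames):
    # Log-likelihood ratio; positive values favour the pathological class.
    return avg_loglik(models["pathological"], frames) - avg_loglik(models["healthy"], frames)

def train_audio_level(train):
    # Audio level fusion: pool the frames of every task, one model per class.
    return {c: fit_gaussian([f for t in TASKS for f in train[c][t]]) for c in CLASSES}

def score_audio_level(models, speaker):
    return llr(models, [f for t in TASKS for f in speaker[t]])

def train_score_level(train):
    # Score level fusion: one model per class for each task separately.
    return {t: {c: fit_gaussian(train[c][t]) for c in CLASSES} for t in TASKS}

def score_score_level(models, speaker):
    # Fuse the per-task scores with an unweighted mean (an assumed fusion rule).
    return fmean(llr(models[t], speaker[t]) for t in TASKS)

def synth_frames(rng, task, label, n=200, dim=2):
    # Synthetic features: each task and each class shifts the mean.
    centre = TASKS.index(task) + (3.0 if label == "pathological" else 0.0)
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
train = {c: {t: synth_frames(rng, t, c) for t in TASKS} for c in CLASSES}
test_speaker = {t: synth_frames(rng, t, "pathological", n=50) for t in TASKS}

audio_score = score_audio_level(train_audio_level(train), test_speaker)
fused_score = score_score_level(train_score_level(train), test_speaker)
print(f"audio level fusion LLR: {audio_score:.2f}")
print(f"score level fusion LLR: {fused_score:.2f}")
```

Both functions return a log-likelihood ratio for one speaker, so the same decision threshold logic applies to either. The toy data says nothing about which strategy performs better on real recordings; it only shows where the pooling happens in each case.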
Similar resources
Automatic age detection in normal and pathological voice
Systems that automatically detect voice pathologies are usually trained with recordings from populations of all ages. However, such an approach might be inadequate because of the acoustic variations in the voice caused by the natural aging process. On top of that, elder voices present some perturbations in quality similar to those related to voice disorders, which make the detection of pa...
Voice Pathology Detection on the Saarbrücken Voice Database with Calibration and Fusion of Scores Using MultiFocal Toolkit
The paper presents a set of experiments on pathological voice detection over the Saarbrücken Voice Database (SVD) by using the MultiFocal toolkit for a discriminative calibration and fusion. The SVD is freely available online containing a collection of voice recordings of different pathologies, including both functional and organic. A generative Gaussian mixture model trained with mel-frequency...
Voice pathology detection and classification using MPEG-7 audio low-level features
In this paper, a new pathological voice detection and pathology classification method based on MPEG-7 audio low-level features is proposed. MPEG-7 features are originally used for multimedia indexing, which includes both video and audio. Indexing is related to event detection, and as pathological voice is a separate event from normal voice, we show that MPEG-7 audio low-level features can do ver...
Multiple-Feature Fusion Based Onset Detection for Solo Singing Voice
Onset detection is a challenging problem in automatic singing transcription. In this paper, we address singing onset detection with three main contributions. First, we outline the nature of a singing voice and present a new singing onset detection approach based on supervised machine learning. In this approach, two Gaussian Mixture Models (GMMs) are used to classify audio features of onset fram...
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many of the non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Publication date: 2012